RzShell: refactor string, regex and byte search #4762
Most calls to process_one_string() never decode a valid string. Previously, it nonetheless allocated a complete buffer on the heap, even if that buffer was freed again after a single iteration. This is now avoided: decoding starts in a buffer on the stack and only continues on the heap once the string is reasonably long.
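The stack-first strategy can be sketched as follows. This is not the actual process_one_string(): decode_printable() and STACK_BUF_LEN are hypothetical names, and the "decoder" only accepts printable ASCII so that the buffer handling stays in focus. Runs shorter than min_len, which the comment above describes as the common case, are rejected without any heap allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size of the on-stack scratch buffer. */
#define STACK_BUF_LEN 64

/*
 * Hypothetical decoder: collects the run of printable ASCII bytes at the
 * start of `in`. Returns a NUL-terminated heap copy if the run has at least
 * `min_len` characters, otherwise NULL. Short runs never touch the heap.
 */
static char *decode_printable(const unsigned char *in, size_t in_len, size_t min_len) {
	assert(in != NULL || in_len == 0);
	char stack_buf[STACK_BUF_LEN];
	char *heap_buf = NULL; /* Allocated only once the run outgrows stack_buf. */
	size_t heap_cap = 0;
	size_t n = 0;

	for (size_t i = 0; i < in_len; i++) {
		unsigned char c = in[i];
		if (c < 0x20 || c > 0x7e) {
			break; /* End of the printable run. */
		}
		if (!heap_buf && n < STACK_BUF_LEN) {
			stack_buf[n++] = (char)c;
			continue;
		}
		if (!heap_buf) {
			/* Stack buffer is full: continue on the heap. */
			heap_cap = 2 * STACK_BUF_LEN;
			heap_buf = malloc(heap_cap);
			if (!heap_buf) {
				return NULL;
			}
			memcpy(heap_buf, stack_buf, n);
		} else if (n == heap_cap) {
			char *tmp = realloc(heap_buf, heap_cap * 2);
			if (!tmp) {
				free(heap_buf);
				return NULL;
			}
			heap_buf = tmp;
			heap_cap *= 2;
		}
		heap_buf[n++] = (char)c;
	}

	if (n < min_len) {
		free(heap_buf); /* NULL in the common short case: free(NULL) is a no-op. */
		return NULL;
	}

	char *out;
	if (heap_buf) {
		out = realloc(heap_buf, n + 1); /* Room for the terminator. */
		if (!out) {
			free(heap_buf);
			return NULL;
		}
	} else {
		out = malloc(n + 1);
		if (!out) {
			return NULL;
		}
		memcpy(out, stack_buf, n);
	}
	out[n] = '\0';
	return out;
}
```

The 64-byte stack buffer is an arbitrary choice here; the trade-off is stack usage per call versus how often the heap path is taken.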
Going to squash commits into reasonable parts and push to
@Rot127 please open a new PR with that.
An immense amount of work and much better test coverage. While there are still things that could be improved, I believe they can go into separate PRs so as not to block this PR any longer.
Thus, LGTM. Let's merge it once it is green and has a cleaner history! Kudos!
@@ -99,7 +99,7 @@ jobs:
os: ubuntu-22.04
build_system: meson
compiler: gcc-12
cflags: "-DASAN=1 -DRZ_ASSERT_STDOUT=1 -ftrivial-auto-var-init=pattern -funsigned-char"
Please also open an issue about it.
@@ -31,6 +33,7 @@ typedef enum {
RZ_STRING_ENC_EBCDIC_US = 's',
RZ_STRING_ENC_EBCDIC_ES = 't',
RZ_STRING_ENC_GUESS = 'g',
RZ_STRING_ENC_SETTINGS = 'S', ///< Use str.encoding.
@wargio, you never answered this one. It's okay for now, but it would be nice to separate it.
#include <rz_list.h>
#include <rz_th.h>

#define RZ_SEARCH_AES_LENGTH 40
I think this should be in a separate file, but it is fine for this PR; it could be done afterwards.
Yes, it will be, once the AES search is refactored. @wargio had already moved it into a separate file before.
Keep this internal. There is no real need to move it, and since I want to refactor that type of search further and add new features, I think it is okay for it to stay there for now.
Closing in favor of a PR with a cleaned-up history. Superseded by #4919.
Your checklist for this pull request
Detailed description
Changes made
- / to /z.
- /xr.
- ps
- psu alias for ps utf8.
- str.search.max_uni_blocks - Effectively a metric the user should not need to know about; adds too much complexity.
- str.search.max_threads -> search.max_threads - This is now a general setting for the search.
- str.search.raw_alignment -> search.str.raw_alignment - Unify settings (only used for RzBin search).
- str.search.encoding -> str.encoding - Valid for all string interpretations.
- str.search.min_length -> search.str.min_length - Unify settings.
- str.search.buffer_size -> search.str.max_length - Unify settings.
- str.search.max_region_size -> search.str.max_region_size - Unify settings.
- str.search.check_ascii_freq -> search.str.check_ascii_freq - Unify settings.
- /! - Because command modifiers are not yet handled properly in RzShell and the benefit of this one is dubious (IMHO).
- /f - A modifier, and obsolete because the search is dispatched into threads.
- /b - A modifier, and obsolete because the search is dispatched into threads.
- /+ - Unclear what it does; it does not seem particularly useful.
- /e - Replaced by regex search in byte and string search.
- /w - All Unicode is now searched properly with /z.
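To illustrate what the unified search.str.min_length and search.str.max_length settings bound, here is a minimal sketch of a printable-ASCII string scan. StrHit, find_ascii_strings() and the exact treatment of over-long runs (reported once, with the length capped at max_length) are assumptions for illustration, not rizin's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hit record: where a string starts and how long it is. */
typedef struct {
	size_t offset;
	size_t length;
} StrHit;

static int is_printable_ascii(unsigned char c) {
	return c >= 0x20 && c <= 0x7e;
}

/*
 * Reports runs of printable ASCII in buf. Runs shorter than min_length are
 * dropped; longer runs are reported once, with the length capped at
 * max_length (assumed semantics). Returns the number of hits stored.
 */
static size_t find_ascii_strings(const unsigned char *buf, size_t len,
	size_t min_length, size_t max_length, StrHit *hits, size_t max_hits) {
	assert(min_length <= max_length);
	size_t n_hits = 0;
	size_t i = 0;
	while (i < len && n_hits < max_hits) {
		if (!is_printable_ascii(buf[i])) {
			i++;
			continue;
		}
		size_t start = i;
		while (i < len && is_printable_ascii(buf[i])) {
			i++;
		}
		size_t run = i - start;
		if (run >= min_length) {
			hits[n_hits].offset = start;
			hits[n_hits].length = run < max_length ? run : max_length;
			n_hits++;
		}
	}
	return n_hits;
}
```

For the buffer "ab\0hello\0abcdefghij" with min_length 3 and max_length 8, "ab" is dropped, "hello" is reported at offset 3, and the ten-character run at offset 9 is reported with length 8.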
- RzStrEscOptions were used inconsistently. E.g. show_asciidot (replace non-printable ASCII with a dot) was ignored for \n, \t, etc.
- \U00hhhhhh. All other non-printable bytes are escaped with \xhh. There are still some exceptions (where legacy escape functions are used), but most places are fine now.
- /Uhhhhhh (if not requested otherwise by the user) and invalid code points to /xhh.
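The escaping rules above can be sketched roughly as follows. EscOpts is a hypothetical stand-in for RzStrEscOptions that models only show_asciidot; its layout is an assumption. The point shown is that the flag now applies to every non-printable ASCII character, \n and \t included, and that non-ASCII code points use the \U00hhhhhh form. As a simplification, the sketch escapes every code point above 0x7F (it has no Unicode printability table) and does not special-case invalid code points.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for RzStrEscOptions; only one flag is modelled. */
typedef struct {
	bool show_asciidot; /* Replace non-printable ASCII with '.'. */
} EscOpts;

/* Escapes one code point into out, which must hold at least 11 bytes. */
static int escape_cp(uint32_t cp, const EscOpts *opt, char *out, size_t out_size) {
	assert(out_size >= 11);
	if (cp >= 0x20 && cp <= 0x7e) {
		return snprintf(out, out_size, "%c", (char)cp); /* Printable ASCII as-is. */
	}
	if (cp < 0x80) {
		/* Non-printable ASCII, '\n' and '\t' included: the flag applies to all. */
		if (opt->show_asciidot) {
			return snprintf(out, out_size, ".");
		}
		return snprintf(out, out_size, "\\x%02" PRIx32, cp);
	}
	/* Everything above ASCII: \U plus 8 hex digits, i.e. \U00hhhhhh. */
	return snprintf(out, out_size, "\\U%08" PRIx32, cp);
}

/* Escapes n code points into out. Returns false if out is too small. */
static bool escape_cps(const uint32_t *cps, size_t n, const EscOpts *opt, char *out, size_t out_size) {
	assert(out_size > 0);
	size_t used = 0;
	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		char tmp[11];
		int w = escape_cp(cps[i], opt, tmp, sizeof(tmp));
		if (w < 0 || used + (size_t)w + 1 > out_size) {
			return false;
		}
		memcpy(out + used, tmp, (size_t)w + 1); /* Copies the terminator too. */
		used += (size_t)w;
	}
	return true;
}
```

With show_asciidot disabled, the code points 'h', 'i', '\n', U+1F600 escape to hi\x0a\U0001f600; with it enabled, the newline becomes a dot.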
TODO Overview
What happens here:
Slowly copying changes from https://github.com/Rot127/rizin/tree/rz-search-reference (which is #4742 with some comments already addressed).
Will resend to fuzz-dist when all the tests pass here.
Stuff to do (in no particular order)
- search IO
- API which doesn't require knowledge of search spaces (what rz_search_run() was before).
- search.align should default to search.alignment=asm.cpu_bits/8.
- ctrl + c
- RzSearchHit to hold more complex data.
- find() callback.
Test plan
...
Closing issues
closes #4910